Logic Program Induction using MDL and MAP: An Application to Grammars

Author

  • Kevin Ellis
Abstract

Probabilistic programs provide an appealing language for describing mental theories, because they are Turing complete: any computable process may be described as a program. Program induction is the problem of inferring theories, in the form of (probabilistic) programs, that describe some set of observations. Minimum Description Length, or MDL, is one common approach to program induction [11]. The MDL approach selects the hypothesis (program) such that the sum of the length of the program, along with the length of the data (observations), when encoded with the help of the program, is minimized. Exactly how the data is encoded depends upon the hypothesis; when MDL is used for program induction, the encoding of the data is typically some sort of certificate proving that the program outputs the observations. Instead of MDL, one may also take a Bayesian approach to program induction. This approach involves placing a prior upon programs and calculating, for a given program, the likelihood of the observations. Typically the prior penalizes longer programs [7, pg 385]. In many situations, such as polynomial curve fitting, the MDL and Bayesian approaches coincide: the MAP hypothesis and the hypothesis with minimal description length are the same [7, pg 392]. In this project, I explore the problem of evaluating candidate programs to explain some number of observations, using both the MDL and MAP criteria. I focus on grammar induction: inferring the process (probabilistic program) that generates phrases found within a corpus. Both MDL and MAP are used
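The relationship between the two criteria described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the two candidate grammars, their lengths in bits, and the phrase probabilities are hypothetical values chosen for illustration, and the prior is assumed to be proportional to 2^(-program length). Under that assumption the log posterior is the negated description length, so both criteria select the same hypothesis.

```python
import math

def description_length(program_bits, phrase_probs, corpus):
    """MDL score: bits for the program plus bits for the data given the program."""
    data_bits = -sum(math.log2(phrase_probs[p]) for p in corpus)
    return program_bits + data_bits

def log2_posterior(program_bits, phrase_probs, corpus):
    """Unnormalized log2 posterior, assuming a prior proportional to 2^(-program_bits)."""
    log_prior = -program_bits
    log_likelihood = sum(math.log2(phrase_probs[p]) for p in corpus)
    return log_prior + log_likelihood

# Hypothetical corpus and hypotheses (not taken from the paper).
corpus = ["ab", "ab", "aabb", "ab"]
hypotheses = {
    # A long grammar that lists the observed phrases with their empirical frequencies.
    "memorize": (40.0, {"ab": 0.75, "aabb": 0.25}),
    # A short recursive grammar S -> a S b | a b, each rule with probability 0.5.
    "recursive": (12.0, {"ab": 0.5, "aabb": 0.25}),
}

mdl_best = min(hypotheses, key=lambda h: description_length(*hypotheses[h], corpus))
map_best = max(hypotheses, key=lambda h: log2_posterior(*hypotheses[h], corpus))
print(mdl_best, map_best)
```

With a different prior over programs, or a data encoding that is not the negative log-likelihood, the two selections need not agree.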

Similar resources

Alternating Regular Tree Grammars in the Framework of Lattice-Valued Logic

In this paper, two different ways of introducing alternation for lattice-valued (referred to as L-valued) regular tree grammars and L-valued top-down tree automata are compared. One way defines the alternating regular tree grammar, i.e., alternation is governed by the non-terminals of the grammar; the other combines state with alternation. The first way is ta...

MDL-Based Context-Free Graph Grammar Induction

We present an algorithm for the inference of context-free graph grammars from examples. The algorithm builds on an earlier system for frequent substructure discovery, and is biased toward grammars that minimize description length. Grammar features include recursion, variables and relationships. We present an illustrative example, demonstrate the algorithm’s ability to learn in the presence of n...

Bayesian Induction of Bracketing Inversion Transduction Grammars

We present a novel approach to learning phrasal inversion transduction grammars via Bayesian MAP (maximum a posteriori) or information-theoretic MDL (minimum description length) model optimization so as to incorporate simultaneously the choices of model structure as well as parameters. In comparison to most current SMT approaches, the model learns phrase translation lexicons that (a) do not req...

Inductive Program Synthesis as Induction of Context-Free Tree Grammars

We present an application of grammar induction in the domain of inductive program synthesis. Synthesis of recursive programs from input/output examples involves the solution of two subproblems: transforming examples into straightforward programs and folding straightforward programs into (a set of) recursive equations. In this paper we focus on the second part of the synthesis problem, which cor...

Unsupervised Grammar Inference Using the Minimum Description Length Principle

Context Free Grammars (CFGs) are widely used in programming language descriptions, natural language processing, compilers, and other areas of software engineering where there is a need for describing the syntactic structures of programs. Grammar inference (GI) is the induction of CFGs from sample programs and is a challenging problem. We describe an unsupervised GI approach which uses simplicit...

Publication date: 2012